Data

For our primary dataset on Chinese debt, we used the dataset assembled by Horn, Reinhart and Trebesch, which was exceedingly clean and well documented. For the dependent variable in our analysis, by contrast, we created a brand new dataset by applying text analysis and regular expressions to the matrices of recommendations from the UN Universal Periodic Reviews (UPR) of China for 2013 and 2018. This was an extensive data cleaning task involving large amounts of regex, reshaping, string manipulation, and joining. You can find all of this in the text_analysis.R file in our repo.

To explore how Chinese development finance evolved over time, we can look at the following graph. Initially Asia was the main recipient of Chinese finance, but Africa took off after 2008.

Overall, global debt to China has been rising over the past 15 years. Nonetheless, its average growth rate has slowed down dramatically:

At the same time, it is worth noting that Chinese investment in Africa stands out when we consider the evolution of China’s trade over time. Naturally, the region with which China trades most is Asia.

Here we can observe the geographical evolution of both trade and debt to China.

But what is the composition of Chinese finance? In the following interactive chart, we can observe the total amount of USD committed to development projects per type:

At the same time, we can see where each of the projects is going. In this graph, we get a sense of the share of finance each region receives per sector:

Text Analysis

The text analysis can be divided into two parts. The first, as mentioned above, consisted of building a dataframe from Word documents by breaking a Word table into its constituent parts. The second consisted of sentiment analysis of the two years of recommendations. Using techniques learned in class (lemmatization, a word cloud, bag of words, and basic sentiment analysis), we were able to see that the tone of the documents became more positive from 2013 to 2018. This guided the rest of our research.
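As a rough illustration of the bag-of-words sentiment step, here is a minimal base R sketch. The two texts, the toy lexicon, and the helper names `tokenize` and `score_doc` are all hypothetical and are not taken from text_analysis.R; the project itself also used lemmatization, which this sketch omits.

```r
# Hypothetical recommendation texts, one per review year (illustrative only).
docs <- c(
  "2013" = "Stop the harassment and end arbitrary detention",
  "2018" = "Continue the progress and strengthen efforts to support development"
)

# Toy sentiment lexicon (an assumption): +1 for positive words, -1 for negative.
lexicon <- c(stop = -1, harassment = -1, arbitrary = -1, detention = -1,
             continue = 1, progress = 1, strengthen = 1, support = 1)

# Lowercase the text and split it into words on any non-letter character.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Bag-of-words score: sum the lexicon values of the words that appear in it.
# Words missing from the lexicon index to NA and are dropped by na.rm.
score_doc <- function(text) {
  words <- tokenize(text)
  sum(lexicon[words], na.rm = TRUE)
}

sentiment <- vapply(docs, score_doc, numeric(1))
print(sentiment)
```

A negative total means negative words outnumber positive ones in that year's text, which is the kind of comparison made between the two reviews.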

In 2013, negative comments clearly dominate the discourse:

But in 2018, this changed dramatically:

When pulling data from the text, we decided to use information on whether China had accepted or rejected a particular recommendation as a proxy for the recommending country’s political attitudes towards China. We created variables for each country’s positive and negative comments in each of the two reviews, and also an overall “support score” based on the difference between these two fields. This dataframe was created in China_analysis.R. We used this dataframe throughout the rest of the project.
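The construction of the support score can be sketched as follows. This is a minimal base R sketch under stated assumptions: the `recs` data frame, its column names, and its values are hypothetical, and an accepted recommendation is treated as a positive comment and a rejected one as a negative comment.

```r
# Hypothetical recommendation-level data: one row per recommendation.
# Countries, years, and outcomes are illustrative, not project data.
recs <- data.frame(
  country  = c("A", "A", "A", "B", "B", "C"),
  year     = c(2013, 2013, 2018, 2013, 2018, 2018),
  accepted = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Assumption: accepted = positive comment, rejected = negative comment.
recs$positive <- as.integer(recs$accepted)
recs$negative <- as.integer(!recs$accepted)

# Count positive and negative comments per country and review year.
scores <- aggregate(cbind(positive, negative) ~ country + year,
                    data = recs, FUN = sum)

# Support score: difference between positive and negative counts.
scores$support_score <- scores$positive - scores$negative

# Total number of comments, used later as an alternative outcome.
scores$n_comments <- scores$positive + scores$negative

print(scores)
```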

As an interesting side project, we analyzed which categories of recommendations were most likely to be rejected or accepted by China and depicted them in an interactive treemap with a D3 interface within R. You can find this in the repo as interactiveTreemap.html (and you can see the code behind it in china_analysis.R).

Statistical Analysis

The primary file for this section is statistical_analysis.R. This stage involved some data transformation to get the data into a usable “long” format. At this stage we also imported several more variables from the World Development Indicators and the Economist Intelligence Unit to use as controls.

We did some initial analysis to understand the relationships between our variables. For example, we applied a Hodrick-Prescott filter, made a correlation matrix of our explanatory variables, and produced working graphs and a table to look at the distribution of our debt-to-China variable using quantiles and boxplots per year.
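Two of these exploratory steps, the correlation matrix and the per-year quantiles, can be sketched in base R. The data below are simulated and the column names (`debt_china`, `other_debt`, `distance`) are assumptions for illustration, not the variables in statistical_analysis.R.

```r
set.seed(42)

# Simulated country-year data (hypothetical values and column names).
df <- data.frame(
  year       = rep(c(2012, 2017), each = 30),
  debt_china = c(rexp(30, rate = 1 / 3), rexp(30, rate = 1 / 5)),  # % of GDP
  other_debt = runif(60, min = 10, max = 80),                      # % of GDP
  distance   = runif(60, min = 1, max = 15)                        # thousand km
)

# Correlation matrix of the explanatory variables.
cor_mat <- cor(df[, c("debt_china", "other_debt", "distance")])

# Quartiles of debt to China within each year (rows: quantiles, columns: years).
debt_quantiles <- sapply(split(df$debt_china, df$year), quantile,
                         probs = c(0.25, 0.5, 0.75))

print(round(cor_mat, 2))
print(round(debt_quantiles, 2))
```

The same distribution can be viewed graphically with `boxplot(debt_china ~ year, data = df)`.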

Initially, we used the “balance” of sentiment from all comments a country made in 2013 and in 2018 as our dependent variable (\(Y_{it}\)) and ran a standard OLS and then a difference-in-differences (diff-in-diff) model with the following specification:

\[Y_{it} = \alpha + \beta_1 \, treat_{it} + \beta_2 \, time_{t} + \beta_3 \, (treat_{it} \times time_{t}) + \gamma' X_{i} + \varepsilon_{it}\]

Here time is a dummy variable indicating whether the observation comes from the first UPR in our sample (2013) or the second (2018), and treat is a continuous variable measuring debt to China as a percentage of GDP. Note: we used debt in 2012 and 2017 since the UPR takes place in January. Using the lag of this variable also helps address reverse causality concerns. We also ran the diff-in-diff with country fixed effects in lieu of the control vector, but this removed too much of the variation.

Not only were the results insignificant, but the coefficients were negative (see table 1). This was not the direction we were expecting, but it gave credence to the alternative theory that countries with higher debt levels simply make fewer comments overall (i.e. the effect is on how many comments a country makes, not on their type). This could be the case if countries were uncertain how a given comment might be received. With this in mind, we ran both a simple OLS and a diff-in-diff regression with the same specification as above, but with the overall number of comments a country made, rather than the “support” level of those comments, as the dependent variable.
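The specification above could be estimated along the following lines. This is a minimal sketch on simulated data, not the code in statistical_analysis.R: the `panel` data frame, its column names, the single control `x`, and the coefficient values used to generate `y` are all assumptions made for illustration.

```r
set.seed(1)
n <- 50  # hypothetical number of countries

# Simulated two-period panel: one row per country per review.
panel <- data.frame(
  country = factor(rep(seq_len(n), times = 2)),
  time    = rep(c(0, 1), each = n),               # 0 = 2013 UPR, 1 = 2018 UPR
  treat   = c(runif(n, 0, 10), runif(n, 0, 15)),  # lagged debt to China, % of GDP
  x       = rep(rnorm(n), times = 2)              # one time-invariant control
)

# Outcome generated with arbitrary coefficients, purely so the model can be fit.
panel$y <- 1 - 0.2 * panel$treat + 0.5 * panel$time -
  0.1 * panel$treat * panel$time + 0.3 * panel$x + rnorm(2 * n)

# Pooled specification with controls: treat * time expands to
# treat + time + treat:time, so the interaction is the diff-in-diff term.
did <- lm(y ~ treat * time + x, data = panel)

# Variant with country fixed effects in lieu of the control vector
# (the time-invariant control x would be collinear with the country dummies).
did_fe <- lm(y ~ treat * time + country, data = panel)

print(coef(did)["treat:time"])
print(coef(did_fe)["treat:time"])
```

Swapping `y` for a count of comments per country gives the second set of regressions described above, with the right-hand side unchanged.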

The OLS estimate was significant at the 5% level across the board and was robust to including controls for other foreign debt and distance to China (table 2). In fact, when all controls are included, significance rises to the 1% level. The diff-in-diff estimates with fixed effects and with controls were insignificant, but the coefficients were negative, as expected (table 3).

This result warrants further investigation. There does appear to be a clear negative relationship between debt to China and the number of comments a country makes in China’s UPR (see table 2). Unfortunately, the UPR takes place only every 5 years and began in 2009. The results of the initial 2009 UPR were not captured in a matrix of recommendations as the 2013 and 2018 UPRs were. By 2013, the trend of Chinese investment was already well advanced globally, so the diff-in-diff is not able to capture this variation. For future work, we recommend attempting to pull the 2009 UPR data and format it for inclusion in the study. We would also recommend including future UPRs to see whether this relationship holds, and whether more observations allow the diff-in-diff to be used more effectively to establish a causal link.

For graphs and tables, go here.